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Abstract 

The integration of intermittent and volatile renewable energy resources requires increased flexibility 
in the operation of the electric grid. Storage, broadly speaking, provides the flexibility of shifting energy 
over time; network, on the other hand, provides the flexibility of shifting energy over geographical 
locations. The optimal control of general storage networks in uncertain environments is an important 
open problem. The key challenge is that, even in small networks, the corresponding constrained stochastic 
control problems with continuous spaces suffer from curses of dimensionality, and are intractable in 
general settings. For large networks, no efficient algorithm is known to give optimal or near-optimal 
performance. 

This paper provides an efficient and provably near-optimal algorithm to solve this problem in a very 
general setting. We study the optimal control of generalized storage networks, i.e., electric networks 
connected to distributed generalized storages. Here generalized storage is a unifying dynamic model for 
many components of the grid that provide the functionality of shifting energy over time, ranging from 
standard energy storage devices to deferrable or thermostatically controlled loads. An online algorithm 
is devised for the corresponding constrained stochastic control problem based on the theory of Lyapunov 
optimization. We prove that the algorithm is near-optimal, and construct a semidefinite program to min¬ 
imize the sub-optimality bound. The resulting bound is a constant that depends only on the parameters 
of the storage network and cost functions, and is independent of uncertainty realizations. Numerical 
examples are given to demonstrate the effectiveness of the algorithm. 


1 Introduction 

To ensure a sustainable energy future, deep penetration of renewable energy generation is essential. Re¬ 
newable energy resources, such as wind and solar, are intrinsically variable. Uncertainties associated with 
these intermittent and volatile resources pose a significant challenge to their integration into the existing 
grid infrastructure [4]. More flexibility, especially in shifting energy supply and/or demand across time and 
network, are desired to cope with the increased uncertainties. 

Energy storage provides the functionality of shifting energy across time. A vast array of technologies, 
such as batteries, flywheels, pumped-hydro, and compressed air energy storages, are available for such a 
purpose [5,6]. Furthermore, flexible or controllable demand provides another ubiquitous source of storage. 
Deferrable loads - including many thermal loads, loads of internet data-centers and loads corresponding to 
charging electric vehicles (EVs) over certain time interval [7] can be interpreted as storage of demand [8]. 
Other controllable loads which can possibly be shifted to an earlier or later time, such as thermostatically 
controlled loads (TCLs), may be modeled and controlled as a storage with negative lower bound and positive 
upper bound on the storage level [9]. These forms of storage enable inter-temporal shifting of excess energy 
supply and/or demand, and significantly reduce the reserve requirement and thus system costs. 

*This report, written in January 2014, is a longer version of the conference paper [1]. An extended treatment of the single 
storage case be found at [2]. An improved algorithm for the networked case and its distributed implementation can be found 
at [3]. This version contains a somewhat more general treatment for the cases with sub-differentiable objective functions and 
Markov disturbance. 
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Email: {jqin, ychow, jiyan}@stanford.edu. R. Rajagopal is with Department of Civil and Environmental Engineering, 
Stanford, CA, 94305. Email: ramr@stanford.edu. 
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On the other hand, shifting energy across a network, i.e., moving excess energy supply to meet unfulfilled 
demand among different geographical locations with transmission or distribution lines, can achieve similar 
effects in reducing the reserve requirement for the system. Thus in practice, it is natural to consider these 
two effects together. Yet, it remains mathematically challenging to formulate a sound and tractable problem 
that accounts for these effects in electric grid operations. Specifically, due to the power flow and network 
constraints, control variables in connected buses are coupled. Due to the storage constraints, control variables 
in different time periods are coupled as well. On top of that, uncertainties associated with stochastic 
generation and demand dramatically complicate the problem, because of the large number of recourse stages 
and the need to account for all probable realizations. 

Two categories of approaches have been proposed in the literature. The first category is based on 
exploiting structures of specific problem instances, usually using dynamic programming. These structural 
results are valuable in providing insights about the system, and often lead to analytical solution of these 
problem instances. However, such approaches rely heavily on specific assumptions of the type of storage, 
the form of the cost function, and the distribution of uncertain parameters. Generalizing results to other 
specifications and more complex settings is usually difficult, and consequently this approach is mostly used to 
analyze single storage systems. For instance, analytical solutions to optimal storage arbitrage with stochastic 
price have been derived in [10] without storage ramping constraints, and in [11] with ramping constraints. 
Problems of using energy storage to minimize energy imbalance are studied in various contexts; see [12-15] 
for reducing reserve energy requirements in power system dispatch, [16,17] for operating storage co-located 
with a wind farm, [18,19] for operating storage co-located with end-user demands, and [20] for storage with 
demand response. 

The other category is to use heuristic algorithms, such as Model Predictive Control (MPC) [21] and look¬ 
ahead policies [22], to identify sub-optimal storage control rules. Usually based on deterministic (convex) 
optimization, these approaches can be easily applied to general networks. The major drawback is that these 
approaches usually do not have any performance guarantee. Consequently, it lacks theoretical justification 
for implementing them in real systems. Examples of this category can be found in [21] and references therein. 

This work aims to bring together the best of both worlds, i.e., to design online deterministic optimizations 
that solve the stochastic control problem with provable guarantees. It contributes to the existing literature 
in the following ways. First, we formalize the notion of generalized storage as a dynamic model that captures 
a variety of power system components which provide the functionality of storage. Second, we formulate 
the problem of storage network operation as a stochastic control problem with general cost functions, and 
provide examples of applications that can be encapsulated by such a formulation. Third, we devise an 
online algorithm for the problem based on the theory of Lyapunov optimization 1 , and prove guarantees 
for its performance in terms of a bound of its sub-optimality. We also show that the bound is independent 
of the realizations of the uncertain parameters. The bound is useful not only in assessing the worst-case 
performance of our algorithm, but also in evaluating the performance of other sub-optimal algorithms when 
the optimal costs are hard to obtain. It can also be used to estimate the maximum cost reduction that can 
be achieved by any storage operation, thus provides understanding for the limit of a certain storage system. 
To the best of our knowledge, this is the first algorithm with provable guarantees for the storage operation 
problem with general electric networks. 

Our methodology is closely related to that of [19], where the focus is on solving the problem of operating 
an idealized energy storage (with no energy dissipation over time, and no charging/discharging conversion 
loss) at data-centers. Our objective is to provide an algorithm to operate generalized storage network in a 
wide range of different settings. This requires an extended or a new analysis in the following aspects. From 
the modeling perspective, in order to capture applications such as deferrable loads and TCLs, we do not 
assume storage level is non-negative, instead, we only assume each storage is feasible (see Assumption 2.1 
for more details). Furthermore, modeling the dissipation of energy over time leads to a new sub-optimality 
bound; the bound in [19] becomes a special case of our bound when the dissipation factor (or storage 
efficiency) is one. A semidefinite program is constructed to decide parameters of the algorithm in order to 

1 Although closely related to the classical Lyapunov theory for stability, the theory and techniques of Lyapunov optimization 
are relatively recent. See [23] for more details. 
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minimize the sub-optimality bound. Finally, the aspect of power network appears to be completely new. 

The rest of the paper is organized as follows. Section 2 formulates the problem of operating a generalized 
storage network under uncertainty. Section 3 gives the online algorithm and states the performance guar¬ 
antee. Section 4 analyzes the single bus case in detail with a generalized storage, and Section 5 provides a 
summary of results for general storage networks. Numerical examples are then given in Section 6. Section 
7 concludes the paper. 


2 Problem Formulation 

2.1 Generalized Storage Models 

We start by defining a generalized storage model for each fixed bus of the electric network. A diagram is 
shown in Figure 1. Such a model may be used for a single bus system by setting the network inflow to be 
zero, or as a component of an electric network as discussed in Section 2.3. We work with a slotted time 
model, where t is used as the index for an arbitrary time period. Given that the actual length of each time 
interval is constant, this allows for simple conversion from power units (e.g., MW) to energy units {e.g., 
MWh) and vice versa. Thus we assume all quantities under consideration in this paper are in energy units, 
albeit many power system quantities are conventionally specified in power units. 


Network inflow 


Energy 

imbalance 



Residual 

energy 

imbalance 


Figure 1: Diagram of a single-bus storage system. 

For the bus under consideration and time period t, the local energy imbalance 5(t) is defined to be the 
difference between the local generation and demand. Both the local generation and demand can be stochastic, 
and therefore 5(t) is stochastic in general. The bus may be connected to other parts of the network, whose 
net energy inflow is denoted by fit). The bus is also connected to a generalized storage, which is specified 
by the following elements: 

• The storage level or State of Charge (SoC) s{t ) summarizes the status of the storage at time period t. 
If s(t) > 0, it represents the amount of energy in storage; if s(t) < 0, —b{t) can represent the amount 
of currently deferred (and not fulfilled) demand. It satisfies s{t) £ [S min , S' max ], where S max is the 
storage capacity, and S' 111111 is the minimum allowed storage level. 

• The storage operation u{t) summarizes the charging (when u(t) > 0) and discharging (when u(t) < 
0) operations of the storage. It satisfies charging and discharging ramping constraints, i.e., u{t) £ 
[£/ min , [7 max ], where £/ mm < 0 whose magnitude is the maximum discharge within each time period, 
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and C/ max > 0 is the maximum charge within each time period. We also use u + (t) = max(u(f), 0) and 
u~{t) = max(— u(t), 0) to denote the charging and discharging operations respectively. 

• The storage conversion function h maps the storage operation u(t) into its effect on the bus. In 
particular, it is composed of two linear functions, namely the charging conversion function h c , and the 
discharging conversion function hP , such that the quantity h c (u + (t)) is the amount of energy that is 
withdrawn from the bus due to u + (t) amount of charge, and /i D (u _ (f)) is the amount of energy that 
is injected into the bus due to u~ (t) amount of discharge, whence 

h(u(t)) = h D (u~(t)) — h c (u + (t )) 


is the net energy injection into the bus. 

• The storage dynamics is then 

s(t + 1) = Xs(t) + u{t), (1) 

where A £ (0,1] is the storage efficiency which models the loss over time even if there is no storage 
operation. 

The storage parameters satisfy the following consistency conditions. 

Assumption 2.1 (Feasibility). Starting from any feasible storage level, there exists a feasible storage oper¬ 
ation such that the storage level in the next time period is feasible, that is 


1. A S min + £/ max > S min 


2. A5' max + [/ min 


^max 


The residual energy imbalance, after accounting for the network inflow and storage operation, is then 
given by: 

S(t) + h{u{t )) + fit) = 8{t) - h c [u + (t)) + h D {u~{t)) + /(*). (2) 

We give a few examples of generalized storage models as follows. 

Example 2.2 (Storage of energy). Storage of energy can be modeled as a generalized storage with S' max > 
S mm > o. Here t/ mm and t/ max correspond to the power rating of the storage, up to a multiple of the length 
of each time period. By setting h c (u + (t )) = (1 / p c )u + (t), and h D (u~(t)) = pPu~(t), one models the energy 
loss during charging and discharging operations. Here p G £ (0,1] is the charging efficiency; pP £ (0,1] is 
the discharging efficiency; and the round-trip efficiency of the energy storage is p G fP > . For instance, based 
on the information from [24], a NaS (sodium sulfur) battery can be modeled with parameters: 


(S min , S max , t/ min , £/ max , p c , p D , A) = (0 MWh, 100 MWh, 
- 10 MWx lh,10MWx 1/1,0.85,0.85,0.97), 


and a CAES (compressed air energy storage) can be modeled with parameters: 

(S min , 5 max , U min , t/ max , p c ,p°, A) = (0 MWh, 3000 MWh, 
- 300MWx Ih, 300MWx Ih, 0.85,0.85,1.00). 


Example 2.3 (Storage of demand). Pre-emptive deferrable loads may be modeled as storage of demand, 
with —s(t) corresponding to the accumulated deferred (but not yet fulfilled) load up to time t , and with u(t) 
corresponding to the amount of load to defer/fulfill in time period t. We have S m ' n < S max < 0 in this case. 
Storage of demand differs from storage of energy in the sense that it has to be discharged before charging is 
allowed. The conversion function can usually be set to h(u(t)) = u(t), and generally A = 1 in deferrable load 
related applications. 
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Example 2.4 (Generalized battery models). It is shown recently that an aggregation of TCLs may he 
modeled as a generalized battery [9]. A discrete time version of such a model can be cast into our framework 
by setting S min = — S' max and S' max > 0. Other storage parameters can be set properly according to Definition 
1 of [9], and we have A < 1 to model energy dissipation. 

We consider the following stochastic piecewise linear cost function for each fixed bus 

L 

g(t) = ^2p(t,£)(a\£)6{t) - a G (£)h c {u + (t)) (3) 

£=1 

+ a B (£)h B (u~(t)) + a ¥ {l)f(t) + a Gonst (t, £)) + , 

where the parameter p{t,£) is in general stochastic, and follows a prescribed probability law, and a s (£), 
a c (£), a B (£), a F (£) and a Const (t,£) are constants, for each £ = 1 and t. This cost function serves 

as a generalization of positive (and/or negative) part cost function of the residual energy imbalance, and it 
encapsulates many applications of storage as shown in Section 2.2. Our analysis applies to a more general 
class of cost functions; see Appendix B for more details. 


2.2 Applications in Single Bus Systems 

The storage operation problem on a single bus system ( f(t ) = 0) can be posed as an infinite horizon average 
cost stochastic control problem as follows: 


minimize lim —IE 

T->oo T 

subject to s(t + 1) = A s(t) + u(t), 
S min <s(t)< 5 ,max , 
Z7 min < u(t) < t/ max , 




(4a) 

(4b) 

(4c) 

(4d) 


where we aim to find a control policy that maps the state s(t) to storage operation u(t), minimizes the 
expected average cost and satisfies all constraints for each time period t. Here, the initial state s(l) € 
[5 min , 5 max ] is given. 

Combining some specific cases of the generalized storage model given in Examples 2.2-2.4 with properly 
defined cost functions leads to possible problem instances of optimal control of storage under uncertainty. 
Here we provide examples that are considered in the literature. 

Example 2.5 (Balancing). Storage may be used to minimize residual energy imbalance given some stochastic 
{S(t) : t > 1} process. Typical cost functions penalize the positive and negative residual energy imbalance 
differently, and may have different penalties at different time periods . (For example, to model the different 
consequences of load shedding at different times of the day.) The problem of optimal storage control for such 
a purpose can be modeled by problem (4) with the cost function 

g (t) =q + {t) (S(t) - h c [u + {t )) + h D (u"(f))) + 

+ q~(t) (6(t) - h G (u + (t)) + h D (u“(f))) 

=q + {t) (S(t) - h c (u + {t)) + h°(u~(t))) + 

+ Q~(t) + h G (u + (t)) - h B {u~(t))) + , 

where q + (t) and q~(t) are the penalties 2 for each unit of positive and negative residual energy imbalance at 
time period t, respectively. 

2 These penalties are usually prescribed deterministic sequences [121. 
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Example 2.6 (Arbitrage). Given that the locational marginal prices {p LMP (t) : t > 1} are stochastic, a 
storage may be used to exploit arbitrage opportunities in electricity markets. The problem of maximizing the 
expected arbitrage profit using storage operations can be cast as an instance of (4), with the cost function 
(i.e., negative profit) given by: 


g(t)=p LMP (t)(h c (u + (t))-h»(u-m 
=p LMP (t) ( h c (u + (t))~h»(u~(t))) + 

-p LMP (t) (-h°(u+(t)) + h»(u~(t))) + . 

Example 2.7 (Storage co-located with a stochastic generation or demand). Applications of this type may 
be cast into our framework using {(5(f) : t > 1} to model the stochastic generation or demand process, and 
{p(t,£) : t > 1} to model the stochastic prices. A possible cost function is 

gif) = p LMP (t) {-6{t) + h c {u + (t)) - h D (u-(t))) + , 

where the residual energy is curtailed with no cost/benefit, and the residual demand is supplied via buying 
energy from the market at stochastic price p LMP (f) . 

2.3 Network Models 

The electric network can be modeled as a directed graph G{V,E). Let n = \V\, m = \E\, and E R be the 
edge set with all edges reversed. We use the notation e ~ v to indicate that e £ {{v',v) £ E U E R : v' £ V}. 
Each edge models a transmission (or distribution) line, and is associated with some power flow. Assuming 
the power system is operated in steady state, and the power flow is approximately a constant over each time 
period t, the energy flow through the line can be obtained by multiplying the power flow by the length of 
each time period and is denoted by f e {t) for e £ E, with the direction of the edge indicating the positive 
direction of the flow. 3 The flow vector f(t) £ ]R m satisfies power flow constraints, which can be compactly 
summarized by the following set of linear constraints using the classical DC power flow approximations to 
AC power flow equations [25]: 

f{t) £T, J={feR m : -F“ < f < F max , Kf = 0}, (5) 

where F max £ R m is the vector of the line capacities of the network, and K £ ]R( m - n + 1 ) xm i s a matrix 
summarizing the Kirchhoff’s voltage law. The construction of this K matrix from network topology and line 
parameters can be found in Appendix ANote that additional network constraints may be included in the 
definition of the set T. 

Each node models a bus in the electric network. On bus v £ V, a set of variables as described in 
Section 2.1 is defined, with a subscript v attached to each of the bus variables, and the network inflow is 
replaced by network flows to the bus from incident lines. The cost for bus v and time period t is then given 

by 


Ijv 

9v(t) = ^p v {t,l)(a s v {t)8 v {t) - o%(i)h°{u+{t)) 
i 

{t)h% (■ u~ (t)) +a p {£) ^ f e {t) +a£ onst (i, 


( 6 ) 


3 To lighten the notation, for each e = (vi,V 2 ) £ E , we also define f e /(t) 
V e V, the net inflow ffit) = E e =(„',„) e s ^ + E e =(„', u )gBR fe(t) ■ 


2 —fe(t) for e f = and therefore for all 

^Je=(u / ,v)£E f e (^) £E >/' e (^) - 
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and the networked storage stochastic control problem is defined as follows: 


minimize 
subject to 


lim —E 

T— > oo T 


" T 




Sy{t “b 1) - XySyft) “b u v (t), 

S™ n < s v (t) < 5™ ax , 

u™ n < u v (t) < ur x , 
f ft) e T. 


(7a) 

(7b) 

(7c) 

(7d) 

(7e) 


In this problem, we aim to find a control policy that maps the state s ft) to storage operation u(f) and 
network flow f(f), and minimizes the expected average cost objective function(7a), such that constraints 
(7b)-(7d) hold for each t and v, and (7e) holds for each t. 


3 Online Algorithm and Performance Guarantees 

This paper provides an online algorithm for solving (7) with provable performance guarantees. Here we give 
a preview of the algorithm (Algorithm 1) and its sub-optimality bound (Theorem 3.3). The performance 


Algorithm 1 Online Lyapunov Optimization for Storage Network Control 

Input: (i) Storage specifications (5™ m , S'™ 8 *, Uff ln , t/™ ax , hff, hff, X v ), (ii) cost parameters in g v {t) (in¬ 
cluding an upper bound and a lower bound on the sub-derivative of g v ft) with respect to u v ft), denoted 
by Dg v (t ) and Da„(t), and excluding any information about stochastic parameters S v ft) and p v (t,£), 
£ = 1,..., L v ) for each bus v £ V, and (iii) network parameters K and F max . 

Offline-Phase: Determine algorithmic parameters T„ and W v for each bus v £ V, by solving semidefinite 
program (37). 

Online-Phase: 

for each time period t do 

Each bus v G V observes realizations of stochastic parameters S v (t) and p v {t,£), £= 1 ,,L V . 

Solve the following deterministic optimization for storage operation u(t) and network flow f(f): 


• • • \ A 'j'lly (t) 

minimize 2_^ - 777 -b 9v (' t) 

(8a) 

vGV v 


subject to U min < u(t) < U max , 

(8b) 

f (t) € T. 

(8c) 


end for 


theorem will hold under the following additional technical assumptions. 

Assumption 3.1. For each bus v € V, the range of storage control is smaller than the effective capacity of 
the storage, i.e., U™ ax - U™ in < S™ ax - S™ in . 

Since the bounds for storage control t/™ ax and Uff ln are the product of the power rating of storage (in 
unit MW for example) and the length of each time period, this assumption holds for most systems as long 
as the length of each time period is made small enough. For instance, this assumption is satisfied for both 
energy storage examples in Example 2.2. 

We make the following assumption on the stochastic parameters of the system. 

Assumption 3.2. Let the collection of stochastic parameters be Off) = {5 v ft),p v (t, £),£ = 1 ,,L v ,v £ V}. 
Then one of the following two assumptions is in force: 
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1. 0(t) is independent and identically distributed (i.i.d.) across time t, and is supported on a compact set. 

2. 0(t) is some deterministic function of the system stochastic state co(t), which is supported on an (ar¬ 
bitrarily large) finite set 12, and follows an ergodic Markov Chain. 

The first assumption is used to give a simple proof of the performance theorem and provide insights on 
the construction of our algorithm. The second alternative assumption intends to generalize the performance 
bounds to non-i.i.d. cases. The additional assumptions such as that co(t) lies in a finite set are introduced 
to reduce the required technicality in view of the page limit. The same result can be obtained in a more 
general setting where {9{t) : t > 1} follows a renewal process. The performance bounds in these setting are 
the same, up to a multiple of the mean recurrence time of the stochastic process under consideration. 

Under Assumptions 3.1 and 3.2, the following performance guarantee holds. 

Theorem 3.3. The control actions (u(f), f (t)) generated by Algorithm 1 are feasible for (7) and sub-optimal, 
whose sub-optimality 4, is bounded by a constant that depends only on the parameters of the storages and cost 
functions, and is independent of realizations of the stochastic parameters. 

The precise expressions of the sub-optimality bounds for the single bus case and general network case 
are given in Section 4 and Appendix D, respectively, both under the i.i.d. assumption (Assumption 3.2.1). 
The bounds for settings with the Markov assumption (Assumption 3.2.2) are given in Appendix E. 

Remark 3.4 (Convexity). Our result holds without assuming that g v (t) is convex in u v (t), v £ V. However, 
we do assume the online optimization (8) can be solved efficiently, and in all numerical examples we work 
with convex cost functions. 


4 Analysis for Single Bus Systems 


To demonstrate the proof ideas without unfolding all technicalities, we prove Theorem 3.3 for a single bus 
system under the following simplifying assumptions. 


Assumption 4.1. We assume in this section: 


• the imbalance process (<5(t) : t > 1} is independent and identically distributed (i.i.d.) across t and is 
supported on a compact interval [<5 mirl , <5 max ]; 

• for each t = 1 the process {p(t,£) : t > 1} is i.i.d. across t. and is supported on a compact 

interval [p min (£),p max (£)]. 


Define 


u = lim —E 
T—»■ oo T 




s = lim —E 
T— > oo T 




Note that for s(l) € [S min , 5 max ], 


u = lim —E 
T->oo T 


" T 

E^+i) - As W 

_t =i 


= (1 - A)s. 


As s(t) G [5 min , < 5 max ] for all t > 0, the above expression implies 

(1 - X)S min <u<( l- A )S max . 

4 Here the sub-optimality is defined as the difference between the objective value of (7) with (u(£),f(£)) generated by 
Algorithm 1 and the optimal cost of (7). 









Problem (4) can be equivalently written as follows 


PI: minimize lim —IE 

T—too T 


^9(t) 


subject to s(t + 1) = Xs(t) + u(t), 

S min - A s(t) < u(t ) < S max - A s(t), 
U min < u(t) < t/ max , 

(1 - A )S min < u < (1 - A)S' max , 


(9a) 

(9b) 

(9c) 

(9d) 

(9e) 


where bounds on s(t) are replaced by (9c), and (9e) is added without loss of optimality. 

The proof procedure is depicted in the diagram shown in Figure 2, where we use Jpi(u) to denote the 
objective value of PI with storage operation sequence u (as an abbreviation of {u(t) : t > 1}), u*(Pl) to 
denote an optimal sequence of storage operation for PI, Jp 1 = Jpi(u*(Pl)), and we define similar quantities 
for P2 and P3. Here P2 is an auxilliary problem we construct to bridge the infinite horizon storage control 
problem PI to online Lyapunov optimization problems P3 (8) (or (15) for single storage case). It has the 
following form 


P2: minimize lim —IE 

T—too T 




J=1 


subject to U min < u(t ) < U max , 

(1 - A)S'™' 11 <«<(!- A)5 max 


(10a) 

(10b) 

(10c) 


Notice that it has the same objective as PI, and evidently it is a relaxation of PI. This implies that m*(P2) 
may not be feasible for PI, and 

Jp 2 = Jpi(u*(P2)) < J£ v (11) 

The reason for the removal of state-dependent constraints (9c) (and hence (9b) as the sequence {s(t) : t > 1} 
becomes irrelevant to the optimization of {u(t) : t > 1}) in P2 is that the state-independent problem P2 has 
easy-to-characterize optimal stationary control policies. In particular, from the theory of stochastic network 
optimization [23], the following result holds. 

Lemma 4.2 (Optimality of Stationary Disturbance-Only Policies). Under Assumption 4-1 there exists a sta¬ 
tionary disturbance-onlip policy u stat (f) , satisfying (10b) and (10c), and providing the following guarantees 
for all t: 


(1 - A)S min < E[u stat (f)] < (1 - A)S' max , (12) 

E[ 5 (t)| u (t)= u stat (t)] = J^ 2 , (13) 

where the expectation is taken over the randomization ofS(t), p(t,£), £ = 1,...,L, and (possibly) it stat (t). 

Equation (13) not only assures the storage operation induced by the stationary disturbance-only policy 
achieves the optimal cost, but also guarantees that the expected stage-wise cost is a constant across time t and 
equal to the optimal time average cost. This fact will later be exploited in order to establish the performance 
guarantee of our online algorithm. By the merits of this Lemma, in the sequel, we overload it*(P2) to denote 
the storage operation sequence obtained from an optimal stationary disturbance-only policy. 

An issue with u*(P2) for the original problem is that it may not be feasible for PI. To have the 
{s(t) : t > 1} sequence induced by the storage operation sequence lie in the interval [S min , 5' max ], we 

5 The policy is a pure function (possibly randomized) of the current disturbances S(t) and p(t, £), £ = 1,L. 
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Figure 2: An illustration of the proof procedure as relations between problems considered in Section 4. Here 
S denotes the sub-optimality bound. 


construct a virtual queue related to s(t) and use techniques from Lyapunov optimization to “stabilize” such 
a queue. Let the queueing state be a shifted version of the storage level: 

s(t) = s(t) + T , (14) 

where the shift constant T will be specified later. We wish to minimize the stage-wise cost g(t) and at the 
same time to maintain the queueing state close to zero. This motivates us to consider solving the following 
optimization online (j.e., at the beginning of each time period t after the realizations of stochastic parameters 
p(t, £), £ = 1,... ,L, and S(t) have been observed) 

P3: minimize A s(t)u(t) + Wg(t) (15a) 

subject to U min < u{t ) < £/ max , (15b) 

where the optimization variable is u(t), the stochastic parameters in g(t ) are replaced with their observed 
realizations, and W > 0 is a weight parameter. Note that the objective here is a weighted combination of the 
stage-wise cost and a linear term of u(t), whose coefficient is positive when s(t) is large, and negative when 
s(t) is small. We use the notation u ol (f) for the solution to P3 at time period t, u*(P3) for the sequence 
(u ol (f) : t > 1}, Jp3,t(u(i)) for the objective function of P3 at time period t, and Jp 3 t for the corresponding 
optimal cost. In the rest of this section, we give conditions for parameters T and W such that solving P3 
online will result in a feasible {s(t) : t > 1} sequence (Section 4.1), characterize the sub-optimality of u*(P3) 
as a function of T and W and state the semidefinite program for identifying the optimal T and W pair 
(Section 4.2). 

4.1 Feasibility 

We start with a structural result for the online optimization problem P3. It follows from Lemma B.l which 
is proved for general cost functions in Appendix B. 

Lemma 4.3. At each time period t, the solution to P3, u ol (t), satisfies 

1. u ol (t) = U min whenever Xs(t) > —WDg, 
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2. u ol (t) = [7 max whenever Xs(t) < —WDg, 


where 


and 


Dp = inf | 

f £ d u{t) g{t) 

u(t) £ [U min ,£/ max ], 
p(t,£)£[p™ n (£),p™*(£)],W, 

S(t) £ [6 min , <5 max ], 
f(i) £ F, t > 1 

Dg = sup 

£ e d u ( t )9(t) 

< 

u{t) £ [f7 min ,t/ max ], ] 

p(t,£) £[p min (i),p niax (£)},W, \ 

S(t) £ [5 min ,5 max ], 

f(t) £ T, t > 1 J 


are the greatest lower bound and the least upper bound of the sub-derivatives of g(t), respectively. 


Remark 4.4 (Evaluation of Dg , Do). Any finite lower bound and upper bound for the sub-derivative of the 
cost g(t ) can be used as Da and Dg, respectively. Here we use the greatest lower bound and least upper bound 
to provide the tightest performance bounds. For cases with simple cost functions, e.g., for idealized storage 
with L = 1, Da and Dg can be easily obtained from p min (£), p mayi {£), and constants in the cost function g(t) 
(such as a c (£) and a D (£)J. In cases where g{t) is differentiable with respect to u(t), Dg and Dg may be 
obtained by solving a simple optimization problem. 


This allows us to construct the following sufficient condition that will assure the feasibility of the {s(t) : 
t > 1} sequence induced by u*(P3). 

Theorem 4.5 (Feasibility). Suppose the initial storage level satisfies s(l) £ [5 min ,S max ], then the storage 
level sequence {s(f) : t > 1} induced by the sequence of storage operation u*(P3) is feasible with respect to 
storage level constraints, i.e., s(t) £ [S™ 111 , S' max ] j or ^ ^ provided that 


where 


and 


r 


min 


r 


max 


j-imin jpmax 

(16) 

0 <W < lT max , 

(17) 

i \-WDg + (U max - (1 - A) 1 S' max ) + ] -5 max , 

(18) 

i \-WDg - ((1 - A)5 min - U min ) + ] -S min , 

A 

(19) 


4E maX = : 


1 


Dg-Hq L 


A(S ,max - S min )-((1 - X)S min -U min y 
- ([/ max _ (i _ \)S iaax )+' 


( 20 ) 


Proof. The result is proved by induction, where Lemma 
sequence. See Appendix C for more details. 


4.3 is used to partially characterize the u ol (t) 

□ 


4.2 Performance 

In the previous result, we have established that u* (P3) is feasible for PI as long as parameters T and W 
satisfy (16) and (17). In the next theorem, we characterize the sub-optimality of w*(P3) for fixed T and W. 
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Theorem 4.6 (Performance). The sub-optimality of storage operation it*(P3) is bounded by M(T)/W, that 
is 

< Jpi(u*(P3)) < Jf>i + M(T)/W, (21) 

where 


M(r) = M U (T) + A(1 - \)M S (T), 

M U (T) =imax (( U min + (1 - A)r) 2 , (C7 max + (1 - A)r) 2 ), 

M s ( r) = max ((S min + P) 2 , (S max + T) 2 ) . 

Proof. A quadratic Lyapunov function is constructed. The relation between the Lyapunov drift and the 
objective of P3 is exploited, which in turn relates to the objective of P2 and so PI. Appendix C contains 
the whole proof. □ 

The theorem above guarantees that the worst-case cost (among different uncertainty realizations) of our 
online algorithm is bounded above by Jf> 1 + M(T)/W. The sub-optimality bound M(T)/W reduces to a 
much simpler form if A = 1. 

Remark 4.7 (Sub-Optimality Bound, A = 1). For a storage with A = 1 , we have 

M 4 M(r) = (1/2) max((C/ min ) 2 , (t/ max ) 2 ), 


and the online algorithm is no worse than M/W sub-optimal. In this case, one would optimize the perfor¬ 
mance by setting 


( cmax cminA (tj max tt min A 

w = W max = - _ =J -_ - 

Dg-Dg 

and the corresponding interval turns out to be a singleton, where 


r mi„ = pmax = Dg(g maX - U™*) + Dg(U^ - S mi ") . 

Dg-Dg 

Let S max - S min = p(U ma - x - U min ). Suppose \U max \ = \U min \. For efficient storage (X = 1), the sub¬ 
optimality bound is 

M = (1/2 )(Dg - Dg)(U max ) 2 = Dg-Dg ^ 

W (S^ax - 5min) _ ([/max _ [/min) 4(p _ 1) 

For fixed f/ max , as storage capacity increases , i.e., p — > oo, the sub-optimality (M/W) — > 0. If f7 max and 
5' max increases with their ratio p fixed, the bound increases linearly with f7 max . 

The remaining case A G (0,1) requires solving an optimization program to identify the bound-minimizing 
parameter pair (r, W). In the next result, we state a semidefinite program to find (r*, W*) that solves the 
following parameter optimization program 

P3-PO: minimize M(T)/W 

subject to r min < r < r max , o < w < w max . 


In the current form, this program appears to be non-convex. The next result reformulates P3-PO into a 
semidefinite program. Note that r min and r max are linear functions of W as defined in (18) and (19). 
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Lemma 4.8 (Semidefinite Reformulation of P3-PO). Let symmetric positive definite matrices X min ’ u , 
X min ’ s , and X max ’ s be defined as follows 


V [/(•) + (i - A)r' 

A ( ' ),s = 

V S^+T 

_ * 2 W 

1 

_* w 


where (•) can be either max or min, and r] u and r) s are auxilliary variables. Then P3-PO can be solved via 
the following semidefinite program 


minimize 
subject to 


r) u + A(1 - A)t? s 

pmin < p < pmax Q < W < VF maX , 
^min,iA x,n j£-min,s ^max^ q 


Proof. The result follows from Schur complement. See Appendix C for details. 


(23a) 

(23b) 

(23c) 

□ 


We close this section by discussing several implications of the performance theorem. 

Remark 4.9 (Optimality at the Fast-Acting Limit). Let the length of each time period be At. At the limit 
At —> 0, the online algorithm is optimal. Indeed , as discussed in Section 2, both |[/ min | and |[/ max | are linear 
in At, such that |C/ max | —> 0 and |t/ min | —> 0 as At —> 0. Meanwhile, A —> 1 as At —> 0. So by Remark f.l, 
it is easy to verify that the sub-optimality M/W converges to zero as At —> 0. 

Remark 4.10 (Operational Value of Storage and Percentage Cost Savings). Operational Value of Storage 
(VoS) is broadly defined as the savings in the long term system cost due to storage operation. Such an index 
is usually calculated by assuming storage is operated optimally. In stochastic environments, the optimal 
system cost with storage operation is hard to obtain in general settings. In our notations, let u NS denote the 
sequence {u(t) : u(t) = 0,f > 1} which corresponds to no storage operation. Then 


VoS = Jpi(m ns ) — Jp l5 


and it can be estimated by the interval 

J P1 (u NS )-J P1 (n*(P3)), Jfi(u ns )-Jfi(u*(P3)) + ^ . 

Additionally, for a storage operation sequence u, the percentage cost savings due to storage can then be defined 
by (J P i(u NS ) — J P i(u))/J P i(u NS ). An upper bound of this for any storage control policy can be obtained via 
(J P i(u NS ) — J P i(u*(P3)) + M/W) /J P i(u NS ), which to an extent summarizes the limit of a storage system 
in providing cost reduction. 


5 Numerical Experiments 

5.1 Single Storage Example 

We first test our algorithm in a simple setting where the analytical solution for the optimal control policy is 
available, so that the algorithm performance can be compared against the true optimal costs. We consider 
the problem of using a single energy storage to minimize the energy imbalance as studied in [12], where it is 
shown that greedy storage operation is optimal if A = 1 and if the following cost is considered 

g{t) = |<5(f) - (1 /n c )u + (t) + /i E V(f)|. 

As in [12], we specify storage parameters in per unit, and S min = 0. Let pP = pP = 1 so that the parame¬ 
terization of storage operation here is equivalent to that of [12]. We assume each time period represents an 
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hour, and —JJ min = 17 max = (l/10)S' max . In order to evaluate the performance, we simulate the S(t) process 
by drawing i.i.d. samples from zero-mean Laplace distribution, with standard deviation as = 0.149 per unit 
obtained from NREL data [12]. The time horizon for the simulation is chosen to be T = 1000. Figure 3(a) 
depicts the performance of the our algorithm and the optimal cost J^ >1 obtained from the greedy policy, 
where it is shown that the algorithm performance is near-optimal, and better than what the (worst-case) 
sub-optimality bound predicts. 6 

A slight modification of the cost function would render a problem which does not have an analytical 
solution. Consider the setting where only unsatisfied demand is penalized with a higher penalty during the 
day (7 am to 7 pm): 


3 (^)-(«+(t)/, c ) + A (*)) , ter Day , 

(<5(t) — {u + (t)/n c ) + n D u~(t)) , otherwise, 


(24) 


where T Day = {t > 1 : 7 < t mod 24 < 19}. We run the same set of tests above, with the modification 
that now /j c = /j D = 0.95. Note that the greedy policy is only a sub-optimal heuristic for this case. 
Figure 3(b) shows our online algorithm performs significantly better than the greedy algorithm. The costs 
of our algorithm together with the lower bounds give a narrow envelope for the optimal average cost Jp 1 in 
this setting, which can be used to evaluate the performance of other sub-optimal algorithms numerically. In 
both experiments, we also plot the costs of predictive/nominal storage control, whose solution can be shown 
to be u{t) = 0 for all t. Consequently, the costs of such operation rule are the same as the system costs when 
there is no storage. 



(a) (b) 


Figure 3: Performance of algorithms in a single bus network. Note that, different from the setup in Re¬ 
mark 4.7, we scale t/ min , f7 max together with 5' max in this and the following numerical examples. 

Figure 4 translates the results in Figure 3 into percentage cost savings due to storage operation, which 
are computed following the discussion in Remark 4.10. Using the cost of our online algorithm and the 

®By an abuse of notation, in this section, we use Jp , and Jpi to denote the results from simulation, which are estimates of 
the true expectations. 
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theoretical sub-optimality bound, we obtain an upper bound of percentage cost reduction of energy storage 
for any control policy (see black curve in each panel of Figure 4). It indicates the systemic limit of using 
storage to provide cost reduction, and is useful for system design considerations especially when the optimal 
cost cannot be calculated efficiently. 




(a) (b) 

Figure 4: Percentage cost savings of a single storage operated for balancing. 


5.2 Storage Network Example 

We consider a setting similar to that in the single storage numerical example, in which now distributed 
storages are coordinated to minimize the power imbalance over a tree network with N buses. We assume 
the storage network is homogeneous, i.e., the storage installed on each bus of the network has the same 
specifications and the same cost functions. Two cases with different cost functions are considered. In the 
first case, time homogeneous costs of the form 


9v(t) = (^o) - w + 9v u v > ( 25 ) 

are considered, where Py = p^ = 0.95, and S v (t) is i.i.d. following the same distribution as in the single 
storage example. In the second case, each bus has a cost function similar to (24): 


( 3 g*(t), t G T Day , 
otherwise, 

with g^(t) as defined in (25). We consider non-idealized storages which are operated frequently such that 
A„ = 0.999 for all v G V. As in the single storage example, we fix —(7™ ln = C/™ ax = (1/10)5™*“. The storages 
are connected in a star network, with N = 5 and F™ ax = as for each line e G E. The time horizon for the 
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simulation is chosen to be T = 1000. Figure 5 shows the percentage cost savings, where it is demonstrated 
that the online algorithm performs consistently superior to the greedy heuristic, and leads to percentage cost 
savings values that are close to the derived upper bound. Therefore near-optimal performance is achieved 
by our algorithm in both cases. 



(a) 


(b) 


Figure 5: Percentage cost savings of a storage network operated for balancing. 


6 Conclusions and Future Work 

This work is motivated by the fundamental question of how to optimally shift energy over space and time to 
achieve uncertainty reduction and to facilitate renewable integration. To this end, we consider the problem 
of optimal control of generalized storage networks under uncertainty. The notion of generalized storage is 
proposed as a dynamic model to capture many forms of storage conveniently. An online control strategy 
is then proposed to solve the corresponding constrained stochastic control problem, whose performance is 
analyzed in detail. We prove that the algorithm is near optimal, and its sub-optimality can be bounded by a 
constant that only depends on the parameters of the storage network and cost functions, and is independent 
of the realizations of uncertainties. 

Although we have provided analysis for a relatively general setting, potential improvements can be 
achieved in many directions, (i) Our formulation starts by minimizing the long run expected average cost, 
and lands on an online algorithm that has robust performance guarantees in the form of a sub-optimality 
bound that holds for all uncertainty realizations. Relaxing such requirements may result in approaches that 
trade risk with performance. Better performance guarantees (in terms of smaller sub-optimality) may hold 
with large probability (instead of with probability one), which leads to, in a sense, probably approximately 
correct (PAC) algorithms [26]. (ii) While our online control solves deterministic optimization respecting 
network constraints, the sub-optimality bound is derived independent of network topology and network 
parameters such as line capacities. Utilizing such information may lead to a tighter performance bound or a 
more informed choice of algorithmic parameters, (iii) It can be an advantage or a disadvantage that our online 
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algorithm does not use any statistical information about the uncertain parameters, depending on whether 
such information is readily available. Observe that our approach in fact can be generalized immediately 
to settings with additional same-stage variables which do not affect the (temporal) states of the system. 
Incorporating statistical information and forecast updates may improve the performance of the algorithm, 
and make the algorithm applicable to other settings where lookahead variables (such as wind farm contract 
level for the next stage) are considered together with storage operation, (iv) While the focus of this paper 
is on energy networks, the algorithm may be applied to other networks since our analysis does not rely on 
properties of the given constraints on network flow. This also implies that a more accurate AC power model 
can be used in this study as long as the online optimization can be solved efficiently. Recent advances in tight 
convex relaxation of AC optimal power flow [27] can be utilized for such purposes, (v) This paper provides 
a procedure to convert the hard stochastic problem to a sequence of easy deterministic problems which fit 
into today’s grid operational paradigms (especially for transmission grids operated by centralized system 
operators). For the future, the integration of distributed energy resources would require a decentralized 
solution to these problems. We note that many methods have been developed for distributed/decentralized 
deterministic optimization (c/. [28]); incorporating these methods for solving the online problems is an 
important future direction. 
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A Flow Representation of DC Power Flow 

To lighten the notation, we drop the dependence of the network parameters on time t as the discussion here 

applies to any time period. Let the incidence matrix A £ R mxn of the network G(V,E) be defined by 

{ 1 if e = (i,v) £ E for some v £ V, 

—1 if e = ( v,i ) £ E for some v £ V, (26) 

0 otherwise. 

Let ft > 0 be the admittance of the line e £ E, and (3 £ R m be the vector {/3 e } e6 £ . Then matrix A is 
related to the F-bus matrix such that F = A T diag(/3)A, where diag(/3) is the diagonal matrix with (3 e , e £ £ 
on its diagonal. Under DC power flow assumptions, the line flows are linearly related to bus phase angles. 
Let 9 V be the phase angle on bus v. Then it is easy to see that 

f = diag(/3)A0, (27) 

and the bus flow injection are given by A 1 f. Equation (27) states that / lives in the range of matrix 
H — diag(/3)A, i.e., f £ 1Z(H). By the fact that the graph is connected, m > n — 1. Provided that 
rank(A) = n — 1, and diag(/3) is full rank, we have rank(iL) = n — 1. Then dim(7?.(i7)) = n — 1 and 
thus the dimension of the nullspace of H T is m — n + 1, i.e., dim (A f(H J )) = m — n + 1. Let the rows of 
K £ b e a basis for Af(H T ) (which can be obtained via e.g., singular value decomposition of 

H). Then 1Z(K J ) = Af(H T ) => J\f(K) = TZ(H), and therefore 

f = HO Ki = 0. 
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B Structural Properties of the Online Optimization 

We consider replacing the g(t) defined in (3) with an extended real-valued function 

g(t) = <t> t (u(t),K,(t)), (28) 

where n(t) is a vector of auxiliary parameters that captures both stochastic parameters and deterministic 
parameters, and it is supported on a compact set Ck ■ Observe that this would make our analysis applicable 
to general cost functions. Similar to discussions in Section 4, we are interested in solving the following 
optimization in each time period t for u(t) 

minimize A s(t)u(t) + W<j>t{u{t ), n{t)) (29a) 

subject to U min < u(t) < 17 max , (29b) 

after observing the realization of /c(i). 

Lemma B.l (Structural Properties of Single Bus Online Optimization). For an extended real valued function 
f>t(u{t), /«(£)), let (f> K (u) = (j> t (u, k), where n is equal to the observed value of n(t). The following statements 
hold, regardless of the realizations of n(t). 

1. if Xs(t) + WD(j) K (u) > 0, then u°\t) = U min ; 

2. if Xs(t) + ffif(tt) < 0, then u ol (t) = C/ max . 

Here, 


Df> K (u) = inf £ <9<^> K (u) 
D(j) K (u ) = sup l £ £ d<j) K (u) 


u £ [{7 min , [7 max ], 
n £ Ck 

u £ [C/ min , C/ max ], 
£ Ck 


and d(j) Kj (u) is the sub-differential of (jf> K (w) 7 . 

Proof. To show the set of sufficient conditions for u° l (t) takes U max (or U min ), notice that the condition 

As(t) < -WDj) K (u) 

implies dJf’ s (u) C (—oo, 0] for any u(t), n(t), s(t). Thus, for every given u £ [U min , [/ max ], if /3 is a constant 
such that 

Jf’>) - J t K ’ s (u) >/3-(v-u), Vve [L/ min , C/ max ], 

then the sub-differential condition implies that /? < 0. Now, by substituting u = [/ max in the above 
expression, one obtains /? • (v — u) >0 and Jf" s (v) > J t K ’ s ([/ max ), for all v £ [£/ min , U max ]. Therefore, one 
concludes that u[t) = t/ max attains an optimal solution in problem (29). Similarly, the condition 

A S(t) > —WD<f K (u) 

implies dJf’ s (u(f)) C [0, oo). Based on analogous arguments, one concludes that u(t) = U min attains an 
optimal solution in problem (29). □ 

' We say ft is a sub-derivative of an extended real-valued function O at. uq fE lit i f 3 ■ (u — uf) < <j>{u) — (j>(u o), for any u £ ]R, 
and denote the set of all such /3, namely the sub-differential of <j> at uq, by d(f)(uo). 
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C Proof of Single Bus Results 

Proof of Theorem 4.5 We first validate that the intervals of T and W are non-empty. Note that from 
Assumption 3.1, W max > 0, thus it remains to show r max > r min . Based on (20), W > 0, and Dg > Dg, 
one obtains 


W(Dg - Dg) <A(S max - S min ) - ([/ max - (1 - A)S ,max )+ 
— ((1 - A)S™ in — U min ) + . 


Re-arranging terms results in 


< 


-WDg + ([/ max — (i _ A)5 max ) + 
—WDg - ((1 - A )S min - u min ) + ~ 


- XS max 

- AS" 11111 


which further implies r max > r mm . 
We proceed to show that 


S min < a (t)< S n 


(30) 


for t = 1,2,..., when w*(P3) is implemented. The base case holds by assumption. Let the inductive 
hypothesis be that (30) holds at time t. The storage level at t + 1 is then s(t + 1) = As(f) + u ol (f). We show 
(30) holds at t + 1 by considering the following three cases. 

Case 1. -WDg < A S(t) < A(S max +T). 

First, it is easy to verify that the above interval for A s(t) is non-empty using (18) and T > r min . Next, based 
on Lemma 4.3, one obtains u ol (t) = [/ min < 0 in this case. Therefore 


a(t + 1) = As(£) + t/ min < A5 m “ + U min < S max , 


where the last inequality follows from Assumption 2.1. On the other hand, 

s(t + 1) = A s(t) + U min > -WDg -XT + U min 

> - WDg - Ar max + [7 min 

> ((1 - A)5 min - U min ) + + XS min + U min > S n 


where the third line used Dg > Dg. 

Case 2. A (S min + T) < As(£) < -WDg. 

The above interval for A s(t) is non-empty by (19) and T < r max . Lemma 4.3 implies u ol (t) = U max > 0 in 
this case. Therefore, using Assumption 2.1, 


s(t + 1) = A a(t) + U max 


> XS min + U n 


> S 


min 


On the other hand, 


s(t + 1) = A s(t) + U max < -WDg -XT + U max 
< - WDg - Ar min + U max 

(1 - A)5 max ) + +AS' max + U max < S n 


where the third line again is by Dg > Dg. 
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Case 3. -WDg < A §{t) < -WD_q. 
By t/ mm < u ol (t) < t/ max , one obtains 


s(t + 1) = A s(t) + u°\t ) < A a(t) + U max 

< - WDg - XT + U max 

< - WDg - Ar min + U max 

_^^y-max_ ^ _ y^gmax^ + _j_ y^gmax _|_ £jmax < -' ^imax 


On the other hand, 


a(t + 1) = A a(t) + u°\t ) > A a(t) + U min 

> - WDg - XT + U min 

> - WDg - Ar max + t/ max 


> ((1 - A)S’ 111 * 11 - U min ) + +XS min + U min > S n 


Combining these three cases, and by mathematical induction, we conclude (30) holds for all t = 1,2,.... 


Proof of Theorem D.4 Consider a quadratic Lyapunov function L(s) = s 2 / 2. Let the corresponding 
Lyapunov drift be 

A(s(i)) = E [L(a(t + 1)) - L(s(t))\s{t)]. 

Recall that s(t + 1) = s(t + 1) + L = A s(t) + u(t) + (1 — A)r, and so 

A (a(t)) = E[(l/2)(«(*) + (1 - A)r) 2 - (1/2)(1 - A 2 )s(t) 2 
+ A s(t)u(t) + A(1 — A)s(f)r|s(f)] 

< M“(r) — (1/2)(1 — A 2 )s(t) 2 

+ E[As(t)u(t) + A(1 — A)s(t)r|s(t)] 

< M U {T) + E [As(i)(u(t) + (1 - A)r)|5(t)]. (31) 


It follows that, with arbitrary storage operation u(t), 


A(S(t)) + WE[g(t)|S(i)] (32) 

<M“(T) + A(1 - X)s(t)T + E[Jp 3 ,t(u(i))|a(i)], 

where it is clear that minimizing the right hand side of the above inequality over u(t) is equivalent to 
minimizing the objective of P3. Given that w stat (f), the disturbance-only stationary policy of P2 described 
in Lemma 4.2, is feasible for P3, the above inequality implies 

A (s(t)) + WM[g(t)\s(t), u(t) = u°\t )] (33) 

<M U (T) + A(1 - A)S(t)r +E[^ 3it |S(t)] 

<M U (T) + A(1 - \)s(t)T + E[Jp 3 , t (n stat (t))|S(t)] 

( = } M“(r) + A5(t)E [u stat (f) + (1 - A)r] + WTE[g(t)\u stat (t)} 

( b ) (c) 

<M(T) + WE[g(t)\u stat {t)} < M{T) + WJ^. 

Here (a) uses the fact that u stat (t) is induced by a disturbance-only stationary policy; ( b ) follows from 
inequalities | s(t) | < (max ((5' max + T) 2 , (5' min + r) 2 )) 1 ^ and |E [u stat (f)] + (1 — A)F| < (1 — A)(max((S' max + 
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T) 2 , (S™ 111 -f F) 2 )) 1 / 2 ; and (c) used ELg(f)|u stat (f)] = Jp 2 in Lemma 4.2 and Jp 2 < Jp X . Taking expectation 
over s(t) on both sides gives 

E [. L{s(t + 1)) - L(s(t))] + WE [g(t)\u(t) = u ol (f)] 

<M{Y) + WJf> 1 . (34) 

Summing expression (34) over t from 1 to T, dividing both sides by WT, and taking the limit T —> oo, we 
obtain the performance bound in expression (21). 

Proof of Lemma 4.8 Based on the re-parametrization r) u = M U (T)/W, g s = M S (T)/W and W > 0, one 
can easily show that problem P3-PO has a same solution as the following optimization problem: 

minimize rj u + A(1 — A )p s 

subject to r min < r < r max , o < w < ir max , 

2 r/ u W> (U min + ( 1-A)T)\ 

2r/ u W > (U max + (1 - A)T) 2 , 

rfw > (s min + r) 2 ,r/ s w > (s ,max + r) 2 . 

The proof is completed by applying Schur complement on the last four constraints of the above optimization. 


D Proof for Networked Systems 


In this section, we generalize the analysis in Section 4 to the network case. First we have the following 
assumption on S v (t) and p v (t, l ), for v £ V, l £ {1,..., L}. 

Assumption D.l. We assume in this section that at each vertex v £ V, the imbalance process {<5„(t) : t > 1} 
and the process {p v (t, £) : t > 1}, l £ (1,..., L}, follow Assumption 4.1. 

Similar to the single bus system case, we define optimization problem PI and P2 for the networked 
system, where the state updates on storage levels and the constraints on storage operations are defined on 
every vertices v £ V. Furthermore, the convex constraint of the network flow is also added to each problem. 
Also, we define the following vector notations for the networked storage levels, storage operations, shift 
parameters and shifted storage levels: 

s(f) = {s„(t)}„ e y, u (t) = {u v (t)} veV , r = {r^(t)}„ ey , 
s(t) = s(f) + r. 


From the theory of stochastic network optimization [23], the following result holds. 

Lemma D.2 (Optimality of Stationary Disturbance-Only Policies). Under Assumption D.l there exists a 
stationary disturbance-only policy (u stat (f), f stat (i)) satisfying the constraints in P2, providing the following 
guarantees for all t: 


(1 - A„)S™ in < EK tat (i)] < (1 - A„)S“ ax ,\/u e v 


E 


\ ^ stat 


.vev 




— J-l 


P27 


where the expectation is taken over the randomization of 5 v (t), p v (t,£), i = 1,... ,L, u v (t), f e (t) and 


gT\t) = ^Pv{t,i)(at{£)5 v {t)- a C{i)h°( K tat (t)) + ) 
1=1 

+a»(e)h» (« tat (t)) _ ) +^(Crw+“» offlt (M)) + 
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Recall the online optimization in expression (8). By using (u ol (f), f ol (i)) to denote the minimizers at 
each time step: 

f °\t) 4 { f?(t )} eeE , u °\t) 4 {u°\t)} veV , 

u*(P3) and f*(P3) to denote the sequence (u ol (f) : t > 1}, {f ol (t) : t > 1} respectively and Jp 3 ,t(u(t), f(t)) 
to denote the objective function of P3 at time period t. In the rest of this section, we will analyze feasibility 
conditions and performance bounds for the Lyapunov optimization algorithm in networked system. Before 
getting into the feasibility analysis, on each vertex v £ V we define the following bounds: F™ m , F™ ax and 
W™ ax , by their single bus system counterparts. 

Theorem D.3 (Feasibility). Suppose the initial storage level satisfies s„(l) £ [S'™ 111 ,S'™ 351 ], for all v £ V, 
then the storage level sequence (s(t) : t > 1} induced by the sequence of storage operation (u*(P3)) is feasible 
with respect to storage level constraints, i.e., s v (t) £ [S™ ln ,S™ ax ] for all t and v £ V, provided that 

r™™ < r„ < r™ ax , o <w v < VF™ ax , 


for all v £ V. 

Proof. Based on analogous arguments in Theorem 4.5, one can easily validate that the intervals of r„ and 
W v . Vi> £ V, are non-empty, noting that 

W v (pg v -Dg v ) <A,(S™ ax - S™ in )-(f/” ax - (l-A„)S™ ax ) + 

- ((1 - A„)S™™ - C/™™)+, V« £ V. 

For the feasibility argument on s v (t), Vv £ V, when f(f) is any fixed quantity, one can show s v (t) £ 
[S™ ln , S™ ax ] by applying Theorem 4.5 to each vertex v £ V. Since this feasibility result is uniform in 
variable f(f) (both Dg v and Dg„ are independent of f (t), and f(t) does not explicitly affect the storage level 
dynamics), the proof is completed by substituting f(f) = f ol (t), Vt. □ 

The following theorem provides a performance bound for Lyapunov optimization in networked system. 
On each vertex v £ V, we also define the following quantities: M“(r„), Mf(F v ) by their single bus system 
counterparts. 

Theorem D.4 (Performance). The sub-optimality of control sequence (u*(P3), f*(P3)) is bounded by 
M(T)/W, that is 

Jpi < Jpi(u*(P3), f*(P3)) < Jp! + 

vEV 

where 

M V {T V ) = Af“(r„) + A„(l - \ V )M S V (T V ). 

Proof. Consider a quadratic Lyapunov function L v (s v ) = s„/ 2. Let the corresponding Lyapunov drift be 

A v (s v (t)) = E [Ly(s v {t + 1)) - Ly(s v {t))\s v (t)] . 

Similar to the analysis in expression (31), one obtains 

A(s(t))<^M“(r)+E[A^(t)K(t) + (l-A„)r v )|s w (t)]. 

vev 

It follows that, with arbitrary storage operation u (t) and network flow f(t), 


A(s(t)) + WE 


9v(t)\s(t) 

.vev 


< M U (T)+ 


y A„(l - A„)s„(t)r„ + E[J P 3 jt (u(t),f(t))|s(t)], 

vGV 
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where it is clear that minimizing the right hand side of the above inequality over (u(£),f(f)) is equivalent 
to minimizing the objective of P3. Since (u stat (£),f stat (£)), the disturbance-only stationary policy of P2 
described in Lemma D.2, is feasible for P3, similar to the analysis in expression (33), the above inequality 
implies 


VFE 




.veV 


+A(S (£)) 


<M U (T) +J2 A,(l - A V )S V (t)T v + E [ J P3 ,t (u stat (£),f stat (£))] 

v£V 

<M(T) + WJp 1 . 


Taking expectation over §(f) on both sides gives 


TVE 


.vev 


<M{T) + WJp ± 


+ E [L(s(t + 1)) - £(§(£))] 


(36) 


Summing expression (36) over t from 1 to T, dividing both sides by WT, and taking the limit T —> oo, we 
obtain the performance bound in expression (D.4). □ 

The semidefinite program for minimizing the bound is as follows. 

Lemma D.5 (Semidefinite Optimization of M(T)/W). Let symmetric positive definite matrices X™ ln ’ u , 
X ma X , U) x“ in > s , and X“ ax - S , v £ V, be defined as follows 



+ (1 - A„)r„' 

2 W 



:(■) 


W 


where (•) can be either max or min, and rff and 77 ® are auxiliary variables, for any v £ V. Then the 
sub-optimality bound M(T)/W can be optimized by solving the following semidefinite program 


minimize 
subject to 


vGV 


Vv + A„(l - K)v s v 


0 < W < lT, max , 

Y" min, it y' max ,i* x^min,s ymax,s 


to, 


where constraints (37b)-(37d) hold for all v £ V. 

Proof. The proof is similar to the proof of Lemma 4.8 and is omitted here. 


(37a) 

(37b) 

(37c) 

(37d) 


□ 


E Generalization to Markov Cases 

Under Assumption 3.2, Oft) is some deterministic function of the system stochastic state w(t), where ui(t) is 
a finite state ergodic Markov Chain, supported on fh Let wq 6 H be the initial state of w(t). Since ui(t) is 
an ergodic Markov chain, there exists a sequence of finite random return time T r , for r £ N, such that u>(T r ) 
revisits ujq for the r-th time at time T r . Define N(t) as the number of visits of ujq at time t. Specifically, 
N(t) = max{r : T r < £}. From this sequence of return times, we define the r—th epoch as [T r , T r+1 — 1] and 
the length of this epoch is defined as ATj. = T r+ 1 — T r . From the definition of a return time in a Markov 
chain, it is obvious that the sequence of {AT r } r6 N is i.i.d.. Let AT be a random variable with the (common) 
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time distribution of AT r , Vr. The positive recurrence assumption implies that E[AT] < oo. We assume 
that the second moment of AT is bounded: E [(AT) 2 ] < oo and define the mean return rate of state Wq as 
U = 1/E[AT], 

For the feasibility analysis, the result directly follows from Theorem 4.5, as P3 is a deterministic online 
optimization problem. Next, we turn to the performance analysis of the Lyapunov optimization algorithm. 

Theorem E.l (Performance). The sub-optimality of storage operation w*(P3) is bounded by M(T)/W 
almost surely, that is 

Jpi < Jpi(u*(P3)) < + M{T)/W (38) 

almost surely, where 


M{T) = M S (T) + 1ZE [M t (T, AT)], 

M U {T) =imax (([/ min + (1 - A)T) 2 , (f/ max + (1 - A)T) 2 ), 

M S (T) = A(1 - A) max ((S min + T) 2 , (S max + T) 2 ) , 

M t (T,T) = M“(T)(2T 2 + T), 

Proof. Similar to the case with i.i.d. assumptions, consider a quadratic Lyapunov function T(s) = s 2 / 2. Let 
the corresponding Lyapunov drift be 


A(S(t)) = E [L(s(t + 1)) - L(s(t))\s(t)}. 

Based on the analysis in expression (31), one obtains 

A(S(i)) < M“(T) + E [Aa(t)(u(i) + (1 - A)T)|5(i)]. 

By substituting t = T r , and by a telescoping sum, it follows that, with arbitrary storage operation u(t), 


E 


<E 


Tr+l-l 


L(s(T r+1 ))-L(~s(T r )) + W J2 9 (r)\~s(Tr ) 

r=T, 

T r +i — l 

\s(T)u(T) + Wg{T)\s{T r ) 


T=T r 


M“(T)E[AT r .|s(T r .)]+E 


Tr+l-l 


E A5(r)(l-A)T)|S(r r ) 


r=T r 


It is clear that minimizing the right hand side of the above expression over u(t) is equivalent to minimizing the 
objective of P3. Given that u stat (r), the disturbance-only stationary policy of P2 described in Lemma 4.2, 
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is feasible for P3. Thus, this expression implies 


E 


Tr+l-l 


<E 


<E 


t=T t 
Tr +1 — 1 


E A (s( r )) + w 9 {'r)\s{T r ) 1 u(t) = u°\t), Vr 

-=T r 
"+i—1 

E M“(r) + A(l-A)s(r)r +Jp 3 , r |s(T r ) 

T—T r 

E Jp 3 ,.(« stat (r))|S(T r ) 

n+l —1 

E M“(r) + A(1 - A)s(r)T|s(T r ) 

r— T r 

n+l-1 

E » stat ( r ) 

r=T r 

r +1 —1 

+ E E M “( r ) + As(r)( U stat (r) + (1 - A)T)|s(T r ) 


E 


=WdE 


r=T r 


(39) 


where the first inequality follows from the definition of Jp 3 jT and the first equality follows from the facts 
that {u stat (T)E +1 1 is a sequence of disturbance-only control polices and each epoch: [T r ,T r+ i — 1], Vr, is 
independent and identically distributed. Based on the definitions of s(r) and M“(T), one obtains |(M stat (r) + 
(1 - A)T)| < a /2M“(T), Vt and 

Tr+l-l T r+ i-l 

El(»(r)-5(T r ))|<E 

T —T r T=T r 

T r + i — l r —1 

< E E l(«(j') + ( 1 -A) r )| 

r=r r j=r r 

_Tr+l-l 

<y/2M u (T) E AT r < V 2M “( r ) AT r- 

r—T r 


t — 1 

E ; 

j=TV 


A^- 1 ( U (i) + (l-A)T) 


By the Renewal Cost Theorem (Theorem 3.6.1, [29]), with {A T r , X)r3r r 1 g stat (r)} and {AT r , X)r3r r 1 
Vr, forming two i.i.d. processes, one obtains 



Tr+ 1 — 1 

= lim -E 

£—>oo t 

~N(t) — l T r+1 - 

1 

1ZE 

E 3 stat (r) 

E 

E 

<? stat (r) 


r=T r 

r=0 

r=T r 

- 



= lim -E 
£—> oo t 

£ 

E/ ta ‘(r) 

.r— 1 

5 


Tr+l-l 

= lim -E 

£—>•00 t 

"iV(£)-lT r+1 - 

1 

KE 

E uSta V) 

E 

E 

w stat (r) 


r=T r 


r=0 

r=T r 

- 



= lim -E 

£—>oo t 

£ 

E“ stat M 



,T = 1 


^stat 


(r)}, 
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Since the epoch duration A T r is independent of the shifted queueing state s(T r ), based on the definition of 
Mt (r, AT r ), one obtains 

E [Mt{T, AT r )\s(T r )] = E [Mr(r, AT r )]. 

Thus, by combining all these arguments, expression (39) becomes 


E 


T r . 4-1 — 1 


E A (®( r )) + w 9(T)\s(T r ),u(T ) = u o 1 (t), Vt 

r =T r 

1 * 

<TTE[AT r ] lirn - E [ 5 stat (r)] + E [2M“(r)AT 2 ] 


+ As(T r )E[AT r ] lim -E 

\ t—yoo t 


E 

T—1 


M stat (r) 


+ (i - A)r 


Furthermore, recall that epoch duration AT r is independent and identically distributed. Therefore, there 
exists a random variable AT with the common inter-arrival time distribution, such that 


E [AT 2 ] 


E 


(AT) 5 


E[AT r ] = E [AT], Vr. 


Note that liirq^^ 7 Y^t=i E [g stat (T)] = </p 2 A Jpi- By taking expectation in the above expression with 
respect to s(T r ), one obtains 


E 


Tr+1~ 1 

A(s(r)) + Wg(r)\u(r) = w ol (r), Vt 


r=T r 

<E [M t (T, AT r )] + M S (T)E [AT r ] + WM[AT r ] 


By a telescoping sum over r = 0,..., N(t) — 1, and notice that 


Tq — 1, T r+ i — T r + AT r , Vr > 0, 


the above expression implies 


E 


iAT(t) 


E A (*( r )) + Wg(T)\u(r) = u°\t), Vt 


< 


N(t) (E [M t (T, AT)] + (M S (T) + IF4)E [AT]). 


(40) 


Dividing both sides the above expression by lV(i)E [AT], and recalling the definition of _Ad(T), the above 
expression becomes 


W 

N(t)E,[AT] 


E 


'»(t) 


E s( t )I“ o1 ( t )> Vr 


T = 1 


[L,s(T mil » - L(s(0))l 
<M(r) + wjp!. 


(41) 


From the assumption of Lyapunov function, we know that for any N(t) £ N, L(s(T N ( t ))) > 0 and 0 < 
L(s( 0)) < 00 . Also, recall t r —> 00 as r —> 00 and N(t) = max{r : T r < t}. Then, as t —>• 00 , we get 
N(t) —> 00 . This implies that 

M[L(s(T N{t) ))-L(s(0))] y E [L(s(T N{t) ))] 
ti™ A(t)E[AT] t-5S> JV(t)E[AT] 
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which is a positive quantity. By taking the limit on i -> oo, and dividing both sides by W € (0, W max ], 
expression (41) implies 


E 


lim ■ 

t — > OO 


Er=lff( r ) -Er=T Nm + l9( T )\ u (t)’ Vt 


<m +4l . 


(42) 


A(t)E[AT] - W 

Now, since T NW + l<t< T N ( t )+i, and T N ^ +1 - T N{f) = A T NW , by letting 


^max _ max / \g(t)\ 


u{t) G [J7 min ,C/ max ], 
p(t,£) G [p min (£),p max (Q], W, 
6{t) G [5 min ,5 max ], 
f(t) G T 


and noting that E [AT], E [(AT) 2 ] < oo and |g(t)| < g max for each time slot t, one easily obtains 

1 


0 < lim - , , r . , 

—t->oo _/V(t)E[AT] 


E 5Z 9(t)\u°\t), Vt 

T=TjV( t)+l 

< E[Ar f _ g -"E[Arl ; 

- t-voo A(t)E[AT] t->oo 7V(f)E[AT] 

The last equality is due to the fact that N(t) —> oo when t —> oo. Next, recall from the Elementary Renewal 
Theory (Theorem 3.3.4, [29]) that 


lim 


t 


= 1 almost surely. 


t-> oo iV(i)E[AT] 

By combining all previous arguments, one further obtains the following expression: 

~t-1 


lim -E 

£->-oo t 


5>(t)|«(t) = MOl ( T )- Vt 


T — l 


- w +Jm 


almost surely. Finally, from the definition of Jpi(w*(P3)), the above inequality implies expression (38). □ 
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